Word Clustering for Collocation-Based Word Sense Disambiguation

Authors

  • Peng Jin
  • Xu Sun
  • Yunfang Wu
  • Shiwen Yu
Abstract

The main disadvantage of collocation-based word sense disambiguation is its low recall, despite relatively high precision. How can recall be improved without decreasing precision? In this paper, we investigate a word-class approach to extending the collocation list, which is constructed from a manually sense-tagged corpus. The word classes, however, are obtained from a larger corpus that is not sense-tagged. Experimental results show that the F-measure improves to 71%, compared with 54% for the baseline system in which word classes are not considered, although precision decreases slightly. Further study reveals the relationship between the F-measure and the number of word classes trained from corpora of various sizes.
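The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not describe the clustering algorithm or how conflicting evidence is resolved, so the function names, the data layout, and the rule of discarding word classes that point to more than one sense are all assumptions made for illustration. The sample words and sense labels are invented as well.

```python
from collections import defaultdict


def extend_collocations(collocations, word_class):
    """Lift word-level collocations to class-level collocations.

    collocations: dict mapping (target, collocate) -> sense label,
                  as would be extracted from a sense-tagged corpus.
    word_class:   dict mapping word -> class id, as would be obtained
                  by clustering a larger untagged corpus.
    Assumption: a (target, class) pair is kept only if every known
    collocate in that class agrees on the sense.
    """
    class_senses = defaultdict(set)
    for (target, collocate), sense in collocations.items():
        cls = word_class.get(collocate)
        if cls is not None:
            class_senses[(target, cls)].add(sense)
    return {
        key: next(iter(senses))
        for key, senses in class_senses.items()
        if len(senses) == 1
    }


def disambiguate(target, context, collocations, class_collocations, word_class):
    """Try exact collocations first, then fall back to word classes."""
    for word in context:
        if (target, word) in collocations:
            return collocations[(target, word)]
    for word in context:
        cls = word_class.get(word)
        if cls is not None and (target, cls) in class_collocations:
            return class_collocations[(target, cls)]
    return None  # no decision: this is what lowers recall


def f_measure(precision, recall):
    """Balanced F-measure (harmonic mean of precision and recall)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data, not taken from the paper.
collocations = {("bank", "river"): "shore", ("bank", "loan"): "finance"}
word_class = {"river": 1, "stream": 1, "loan": 2, "mortgage": 2}
class_collocations = extend_collocations(collocations, word_class)

# "stream" never co-occurred with "bank" in the tagged data,
# but it shares a class with "river", so a sense is still assigned.
sense = disambiguate("bank", ["the", "stream"], collocations,
                     class_collocations, word_class)
```

In this sketch the fallback step is what raises recall: contexts with no exact collocation match can still receive a sense. It can also introduce errors when a class groups words that signal different senses, which is consistent with the slight precision drop the abstract reports.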

Similar resources

Word Sense Induction: Triplet-Based Clustering and Automatic Evaluation

In this paper a novel solution to automatic and unsupervised word sense induction (WSI) is introduced. It represents an instantiation of the ‘one sense per collocation’ observation (Gale et al., 1992). Like most existing approaches it utilizes clustering of word co-occurrences. This approach differs from other approaches to WSI in that it enhances the effect of the one sense per collocation obs...

Kim, Su Nam and Timothy Baldwin (to appear) Word Sense Disambiguation and Noun Compounds, ACM Transactions on Speech and Language Processing

In this paper, we investigate word sense distributions in noun compounds (NCs). Our primary goal is to disambiguate the word sense of component words in NCs, based on investigation of “semantic collocation” between them. We use sense collocation and lexical substitution to build supervised and unsupervised word sense disambiguation (WSD) classifiers, and show our unsupervised learner to be supe...

The Sense Boundary Decision and the Sense Labeling from Collocation Clustering

This paper discusses how to decide the practical sense boundaries of homonymous words. One of the serious problems in making dictionaries or thesauri is the vague boundary between senses. This also becomes a bottleneck in sense disambiguation for practical language processing systems. This paper proposes a method for discovering the sense boundaries of homonyms using collocation from large corpora and ...

One Sense per Collocation and Genre/Topic Variations

This paper revisits the one sense per collocation hypothesis using fine-grained sense distinctions and two different corpora. We show that the hypothesis is weaker for fine-grained sense distinctions (70% vs. 99% reported earlier on 2-way ambiguities). We also show that one sense per collocation does hold across corpora, but that collocations vary from one corpus to the other, following genre a...

Integrating Collocation Features in Chinese Word Sense Disambiguation

The selection of features is critical in providing discriminative information for classifiers in Word Sense Disambiguation (WSD). Uninformative features will degrade the performance of classifiers. Based on the strong evidence that an ambiguous word expresses a unique sense in a given collocation, this paper reports our experiments on automatic WSD using collocation as local features based on t...

Publication date: 2007